Discovering Empirically Conserved Amino Acid Substitution Groups in Databases of Protein Families
Abstract
This paper introduces a method for identifying empirically conserved amino acid substitution groups. In contrast with existing approaches that view amino acid substitution as a pairwise phenomenon, the method presented here identifies conserved groups of amino acids using a data structure called a conditional distribution matrix. The conditional distribution matrix extends the concept of a pairwise substitution matrix by changing the context of substitution from a single amino acid to a group of amino acids. The matrix tabulates information from a database of protein families that contains numerous aligned positions. Each row in the matrix contains the distribution of amino acids in those aligned positions that contain a given conditioning group of amino acids. The method converts a database of protein families into a conditional distribution matrix and then examines each possible substitution group for evidence of conservation. The algorithm is applied to the BLOCKS and HSSP databases. Twenty amino acid substitution groups are found to be conserved empirically in both databases. These groups provide insight into biochemical properties that are conserved in protein evolution.
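The row construction described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the toy alignment columns, the reading of "contain a given conditioning group" as "contain every member of the group", and the choice to count residue occurrences (rather than presence per position) are all assumptions made for illustration. The abstract does not describe the conservation test, so it is left out.

```python
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def conditional_distribution(columns, group):
    """Distribution of amino acids over the aligned positions (columns)
    that contain every member of the conditioning group (assumed reading)."""
    group = set(group)
    counts = Counter()
    for column in columns:
        if group <= set(column):  # position contains the whole group
            counts.update(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {}  # no aligned position contains this group
    return {aa: counts[aa] / total for aa in AMINO_ACIDS if counts[aa]}


def conditional_distribution_matrix(columns, max_group_size=2):
    """Map each conditioning group (a sorted tuple) to its row.
    Groups that occur in no position are omitted."""
    observed = sorted({c for col in columns for c in col if c in AMINO_ACIDS})
    matrix = {}
    for size in range(1, max_group_size + 1):
        for group in combinations(observed, size):
            row = conditional_distribution(columns, group)
            if row:
                matrix[group] = row
    return matrix


# Hypothetical aligned positions; each string is one alignment column.
columns = ["IIVL", "VVIV", "DDEE", "KRKR", "ILLM"]
matrix = conditional_distribution_matrix(columns)
print(matrix[("I", "V")])
```

Here the row for the group {I, V} is built only from the two columns that contain both residues. A real database such as BLOCKS or HSSP would supply many thousands of columns, and the enumeration of candidate groups would need to be bounded, since the number of possible groups grows exponentially with group size.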
Related articles
Theoretical Determination of Amino Acid Substitution Groups based on Qualitative Physicochemical Properties
This paper introduces a novel method for theoretical determination of amino acid substitution groups. The method here involves making a binary matrix based on 48 qualitative physicochemical properties and calculating a substitution matrix based on this using dot products. Isolated groups with high scores are determined to be valid substitution groups and conserved groups are derived from these ...
Amino acid substitutions preserve protein folding by conserving steric and hydrophobicity properties.
We present a comprehensive analysis of amino acid substitution patterns (sets of residues in a position of a multiple alignment) and conservation of physicochemical properties in alignments of protein sequences. Of the one million possible substitution patterns, only a few hundred account for the majority of aligned positions. Very similar distributions of substitution patterns are observed in ...
Sequence and evolutionary analysis of the human trypsin subfamily of serine peptidases.
Serine peptidases (SP) are peptidases with a uniquely activated serine residue in the substrate-binding site. SP can be classified into clans with distinct evolutionary histories and each clan further subdivided into families. We analyzed 79 proteins representing the S1A subfamily of human SP, obtained from different databases. Multiple alignment identified 87 highly conserved amino acid residu...
Sequence patterns derived from the automated prediction of functional residues in structurally-aligned homologous protein families
MOTIVATION: Most proteins have evolved to perform specific functions that are dependent on the adoption of well-defined three-dimensional (3D) structures. Specific patterns of conserved residues in amino acid sequences of divergently evolved proteins are frequently observed; these may reflect evolutionary restraints arising both from the need to maintain tertiary structure and the requirement to...
Modeling protein families using probabilistic suffix trees
We present a method for modeling protein families by means of probabilistic suffix trees (PSTs). The method is based on identifying significant patterns in a set of related protein sequences. The input sequences do not need to be aligned, nor is delineation of domain boundaries required. The method is automatic, and can be applied, without assuming any preliminary biological information, with surp...
Journal title:
- Proceedings. International Conference on Intelligent Systems for Molecular Biology
Volume 4
Publication date: 1996